-
Notifications
You must be signed in to change notification settings - Fork 53
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use squeeze instead of gsub in SquishNormalizer #53
Conversation
@@ -2,7 +2,7 @@ module AttributeNormalizer | |||
module Normalizers | |||
module SquishNormalizer | |||
def self.normalize(value, options = {}) | |||
value.is_a?(String) ? value.strip.gsub(/\s+/, ' ') : value | |||
value.is_a?(String) ? value.strip.squeeze : value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
irb> " hello\t world".strip.gsub /\s+/, ' '
=> "hello world"
irb> " hello\t world".strip.squeeze
=> "helo\t world"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're making an excellent point: the behavior of this normalizer regarding tabs & newlines isn't specified (only tested with normal spaces in specs), nor clearly documented:
- :squish Will strip leading and trailing whitespace and convert any consecutive spaces to one space each
I was personally using this normalizer on varchar
columns (usually represented with a simple <input>
field, thus not really facing the tab/newline problem), but as soon as I went using it on text
columns (with <textarea>
s), I realized it was dropping newlines, a behavior that didn't suit my needs (basic simple_format
), and that I personally found surprising, and deceptive according to the documentation.
IMHO, either the documentation & specs have to be updated accordingly (which makes this normalizer almost useless for multilines attributes), or the implementation has to be modified to follow the documentation (which will not be backward compatible).
BTW, maybe another normalizer could be introduced for multiline attributes, to remove all control characters except for newlines:
config.normalizers[:remove_control_characters] = lambda do |value, options|
value.try(:gsub, /[[:cntrl:]&&[^\n]]/, '')
end
Anyway, I'd be happy to help :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also:
irb> " hello\t world".strip.squeeze
=> "helo\t world"
Note the missing l
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh sorry about that I've totally missed the ' '
argument and thought the missing l
was a typo.
I've corrected this in 748e2bd which is included in the PR.
I've also updated the original benchmarks to reflect this change
…ing spec to avoid regression
ActiveSupport provides a I am not in favor of changing this normalizer to use |
Oh, sorry about that, I've never really came across this ActiveSupport |
According to these benchmarks, using
squeeze
is an order of magnitude faster thangsub
when dealing with squishing/squeezing spaces on almost all Ruby implementations, except on Rubinius where it doesn't make much of a difference.Here is a patch that replaces
gsub
bysqueeze
inSquishNormalizer.normalize
, with all tests green :)